
[PERF] Pass-through multithreaded_io flag in read_parquet #1484

Merged 2 commits into main on Oct 11, 2023

Conversation

jaychia
Contributor

@jaychia jaychia commented Oct 11, 2023

Passes the `multithreaded_io=False` flag through to `read_parquet` when running on the Ray Runner.
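The pass-through pattern described above can be sketched as follows. This is a minimal illustrative sketch, not Daft's actual internals: the function names (`read_parquet`, `_read_parquet_table`) and the return values are assumptions made for the example.

```python
# Hedged sketch: threading a multithreaded_io flag from a user-facing
# read function down to the low-level reader. A distributed runner
# (e.g. the Ray Runner) would pass multithreaded_io=False so that each
# worker performs I/O single-threaded.

def _read_parquet_table(path: str, multithreaded_io: bool) -> str:
    # Stand-in for the low-level parquet reader.
    mode = "multithreaded" if multithreaded_io else "single-threaded"
    return f"read {path} ({mode})"

def read_parquet(path: str, multithreaded_io: bool = True) -> str:
    # The user-facing API simply forwards the flag unchanged.
    return _read_parquet_table(path, multithreaded_io=multithreaded_io)
```

The point of the change is that the flag is forwarded rather than hard-coded, so the Ray Runner can opt out of multithreaded I/O per worker.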


codecov bot commented Oct 11, 2023

Codecov Report

Merging #1484 (fdb0849) into main (439f2bd) will increase coverage by 0.03%.
Report is 1 commit behind head on main.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1484      +/-   ##
==========================================
+ Coverage   74.86%   74.89%   +0.03%     
==========================================
  Files          60       60              
  Lines        6102     6102              
==========================================
+ Hits         4568     4570       +2     
+ Misses       1534     1532       -2     
File                                Coverage          Δ
daft/execution/execution_step.py    92.30% <ø>        (ø)
daft/io/_parquet.py                 100.00% <100.00%> (+5.26%) ⬆️
daft/table/table_io.py              96.52% <ø>        (+0.69%) ⬆️

@jaychia jaychia merged commit a24e918 into main Oct 11, 2023
24 checks passed
@jaychia jaychia deleted the jay/parquet-multithreaded-io branch October 11, 2023 18:26
jaychia added a commit that referenced this pull request Oct 11, 2023
…thread (#1485)

Updates default max_connections value from 64 to 8

Also renames `max_connections` in internal APIs to
`max_connections_per_io_thread` to be more explicit, but keeps naming
for external-facing APIs for backwards compatibility

Note that the total number of connections spawned for the PyRunner is
`min(8, num CPUs) * max_connections`, and these are shared throughout
the multithreaded backend.

The total number of connections spawned for the RayRunner after #1484
is `num_ray_workers * 1 (since we run single-threaded) *
max_connections`.
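The connection arithmetic above can be checked with a short sketch. The specific values for `num_cpus` and `num_ray_workers` are illustrative assumptions; only the formulas come from the commit message.

```python
# Hedged sketch of the connection-count formulas described above.

def pyrunner_total_connections(num_cpus: int, max_connections: int) -> int:
    # PyRunner: min(8, num CPUs) I/O threads, each with max_connections,
    # shared throughout the multithreaded backend.
    return min(8, num_cpus) * max_connections

def rayrunner_total_connections(num_ray_workers: int, max_connections: int) -> int:
    # RayRunner after #1484: each worker runs single-threaded,
    # so the per-worker thread factor is 1.
    return num_ray_workers * 1 * max_connections

# With the new default max_connections = 8:
print(pyrunner_total_connections(num_cpus=16, max_connections=8))        # 64
print(rayrunner_total_connections(num_ray_workers=4, max_connections=8)) # 32
```

Lowering the default from 64 to 8 shrinks both totals by the same factor, since `max_connections` is a multiplicative term in each formula.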

---------

Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>